Goto

Collaborating Authors

 code explanation



LeDex: Training LLMs to Better Self-Debug and Explain Code

Neural Information Processing Systems

In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging for complex tasks. Prior works on self-debugging mostly focus on prompting methods by providing LLMs with few-shot examples, which work poorly on small open-sourced LLMs. In this work, we propose LeDex, a training framework that significantly improves the self-debugging capability of LLMs. Intuitively, we observe that a chain of explanations on the wrong code followed by code refinement helps LLMs better analyze the wrong code and do refinement. We thus propose an automated pipeline to collect a high-quality dataset for code explanation and refinement by generating a number of explanations and refinement trajectories from the LLM itself or a larger teacher model and filtering via execution verification. We perform supervised fine-tuning (SFT) and further reinforcement learning (RL) on both success and failure trajectories with a novel reward design considering code explanation and refinement quality.


Scaffolding Metacognition in Programming Education: Understanding Student-AI Interactions and Design Implications

Ma, Boxuan, Li, Huiyong, Li, Gen, Chen, Li, Tang, Cheng, Xie, Yinjie, Gu, Chenghao, Shimada, Atsushi, Konomi, Shin'ichi

arXiv.org Artificial Intelligence

Generative AI tools such as ChatGPT now provide novice programmers with unprecedented access to instant, personalized support. While this holds clear promise, their influence on students' metacognitive processes remains underexplored. Existing work has largely focused on correctness and usability, with limited attention to whether and how students' use of AI assistants supports or bypasses key metacognitive processes. This study addresses that gap by analyzing student-AI interactions through a metacognitive lens in university-level programming courses. We examined more than 10,000 dialogue logs collected over three years, complemented by surveys of students and educators. Our analysis focused on how prompts and responses aligned with metacognitive phases and strategies. Synthesizing these findings across data sources, we distill design considerations for AI-powered coding assistants that aim to support rather than supplant metacognitive engagement. Our findings provide guidance for developing educational AI tools that strengthen students' learning processes in programming education.


Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding

Nevo, Ziv, Raz, Orna, Yorav, Karen

arXiv.org Artificial Intelligence

Abstract--Understanding the purpose of source code is a critical task in software maintenance, onboarding, and modernization. While large language models (LLMs) have shown promise in generating code explanations, they often lack grounding in the broader software engineering context. We propose a novel approach that leverages natural language artifacts from GitHub--such as pull request descriptions, issue descriptions and discussions, and commit messages--to enhance LLM-based code understanding. Our system consists of three components: one that extracts and structures relevant GitHub context, another that uses this context to generate high-level explanations of the code's purpose, and a third that validates the explanation. We implemented this as a standalone tool, as well as a server within the Model Context Protocol (MCP), enabling integration with other AI-assisted development tools. Our main use case is that of enhancing a standard LLM-based code explanation with code insights that our system generates. T o evaluate explanations' quality, we conducted a small scale user study, with developers of several open projects, as well as developers of proprietary projects. Our user study indicates that when insights are generated they often are helpful and non trivial, and are free from hallucinations. Modern software systems are increasingly complex, and understanding the purpose of existing code is a major challenge in software maintenance, on-boarding, and modernization.



LeDex: Training LLMs to Better Self-Debug and Explain Code

Neural Information Processing Systems

In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging for complex tasks. Prior works on self-debugging mostly focus on prompting methods by providing LLMs with few-shot examples, which work poorly on small open-sourced LLMs. In this work, we propose LeDex, a training framework that significantly improves the self-debugging capability of LLMs.


Can LLMs Identify Gaps and Misconceptions in Students' Code Explanations?

Oli, Priti, Banjade, Rabin, Olney, Andrew M., Rus, Vasile

arXiv.org Artificial Intelligence

This paper investigates various approaches using Large Language Models (LLMs) to identify gaps and misconceptions in students' self-explanations of specific instructional material, in our case explanations of code examples. This research is a part of our larger effort to automate the assessment of students' freely generated responses, focusing specifically on their self-explanations of code examples during activities related to code comprehension. In this work, we experiment with zero-shot prompting, Supervised Fine-Tuning (SFT), and preference alignment of LLMs to identify gaps in students' self-explanation. With simple prompting, GPT-4 consistently outperformed LLaMA3 and Mistral in identifying gaps and misconceptions, as confirmed by human evaluations. Additionally, our results suggest that fine-tuned large language models are more effective at identifying gaps in students' explanations compared to zero-shot and few-shot prompting techniques. Furthermore, our findings show that the preference optimization approach using Odds Ratio Preference Optimization (ORPO) outperforms SFT in identifying gaps and misconceptions in students' code explanations.


From Critique to Clarity: A Pathway to Faithful and Personalized Code Explanations with Large Language Models

Xu, Zexing, Luo, Zhuang, Li, Yichuan, Lee, Kyumin, Etesami, S. Rasoul

arXiv.org Artificial Intelligence

In the realm of software development, providing accurate and personalized code explanations is crucial for both technical professionals and business stakeholders. Technical professionals benefit from enhanced understanding and improved problem-solving skills, while business stakeholders gain insights into project alignments and transparency. Despite the potential, generating such explanations is often time-consuming and challenging. This paper presents an innovative approach that leverages the advanced capabilities of large language models (LLMs) to generate faithful and personalized code explanations. Our methodology integrates prompt enhancement, self-correction mechanisms, personalized content customization, and interaction with external tools, facilitated by collaboration among multiple LLM agents. We evaluate our approach using both automatic and human assessments, demonstrating that our method not only produces accurate explanations but also tailors them to individual user preferences. Our findings suggest that this approach significantly improves the quality and relevance of code explanations, offering a valuable tool for developers and stakeholders alike.


Exploring LLM-Driven Explanations for Quantum Algorithms

d'Aloisio, Giordano, Fortz, Sophie, Hanna, Carol, Fortunato, Daniel, Bensoussan, Avner, Usandizaga, Eñaut Mendiluze, Sarro, Federica

arXiv.org Artificial Intelligence

Background: Quantum computing is a rapidly growing new programming paradigm that brings significant changes to the design and implementation of algorithms. Understanding quantum algorithms requires knowledge of physics and mathematics, which can be challenging for software developers. Aims: In this work, we provide a first analysis of how LLMs can support developers' understanding of quantum code. Method: We empirically analyse and compare the quality of explanations provided by three widely adopted LLMs (Gpt3.5, Llama2, and Tinyllama) using two different human-written prompt styles for seven state-of-the-art quantum algorithms. We also analyse how consistent LLM explanations are over multiple rounds and how LLMs can improve existing descriptions of quantum algorithms. Results: Llama2 provides the highest quality explanations from scratch, while Gpt3.5 emerged as the LLM best suited to improve existing explanations. In addition, we show that adding a small amount of context to the prompt significantly improves the quality of explanations. Finally, we observe how explanations are qualitatively and syntactically consistent over multiple rounds. Conclusions: This work highlights promising results, and opens challenges for future research in the field of LLMs for quantum code explanation. Future work includes refining the methods through prompt optimisation and parsing of quantum code explanations, as well as carrying out a systematic assessment of the quality of explanations.


Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering

Pordanesh, Saman, Tan, Benjamin

arXiv.org Artificial Intelligence

This study investigates the capabilities of Large Language Models (LLMs), specifically GPT-4, in the context of Binary Reverse Engineering (RE). Employing a structured experimental approach, we analyzed the LLM's performance in interpreting and explaining human-written and decompiled codes. The research encompassed two phases: the first on basic code interpretation and the second on more complex malware analysis. Key findings indicate LLMs' proficiency in general code understanding, with varying effectiveness in detailed technical and security analyses. The study underscores the potential and current limitations of LLMs in reverse engineering, revealing crucial insights for future applications and improvements. Also, we examined our experimental methodologies, such as methods of evaluation and data constraints, which provided us with a technical vision for any future research activity in this field.